Dual-tree $k$-means with bounded iteration runtime
Abstract
$k$-means is a widely used clustering algorithm, but for $k$ clusters and a dataset of size $N$, each iteration of Lloyd's algorithm costs $O(kN)$ time. Although there are existing techniques to accelerate single Lloyd iterations, none of these are tailored to the case of large $k$, which is increasingly common as dataset sizes grow. We propose a dual-tree algorithm that gives exactly the same results as standard $k$-means; when using cover trees, we use adaptive analysis techniques to bound, under some assumptions, the single-iteration runtime of the algorithm as $O(N + k \log k)$. To our knowledge these are the first sub-$O(kN)$ bounds for exact Lloyd iterations. We then show that this theoretically favorable algorithm performs competitively in practice, especially for large $N$ and $k$ in low dimensions. Further, the algorithm is tree-independent, so any type of tree may be used.
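For orientation, the $O(kN)$ baseline that the abstract refers to can be sketched as a single naive Lloyd iteration. This is a minimal illustration in plain Python, not the dual-tree algorithm from the paper; the function name `lloyd_iteration` and the list-of-tuples data layout are assumptions made here for the example.

```python
def lloyd_iteration(points, centroids):
    """One naive Lloyd iteration (hypothetical helper, not the paper's method).

    Returns (new_centroids, assignments). Every point is compared against
    every centroid, which is the O(kN) cost the abstract mentions.
    """
    k = len(centroids)
    dim = len(points[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    assignments = []

    for p in points:                           # N points
        best, best_d = 0, float("inf")
        for j, c in enumerate(centroids):      # k centroids per point
            d = sum((a - b) ** 2 for a, b in zip(p, c))
            if d < best_d:
                best, best_d = j, d
        assignments.append(best)
        counts[best] += 1
        for i, a in enumerate(p):
            sums[best][i] += a

    # Recompute each centroid as the mean of its assigned points;
    # an empty cluster keeps its previous centroid.
    new_centroids = [
        [s / counts[j] for s in sums[j]] if counts[j] else list(centroids[j])
        for j in range(k)
    ]
    return new_centroids, assignments


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centroids = [(1.0, 1.0), (9.0, 1.0)]
new_centroids, assignments = lloyd_iteration(points, centroids)
```

The nested loop is where an exact accelerated method has to save work: it must produce the same assignments while avoiding most of the $kN$ distance evaluations.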
Similar resources
Continuous $k$-Frames and their Dual in Hilbert Spaces
The notion of $k$-frames was recently introduced by Găvruţa in Hilbert spaces to study atomic systems with respect to a bounded linear operator. A continuous frame is a family of vectors in a Hilbert space which allows reproductions of arbitrary elements by continuous superpositions. In this manuscript, we construct a continuous $k$-frame, so called c$k$-frame along with an atomic system ...
$\varphi$-Connes Module Amenability of Dual Banach Algebras
In this paper we define $\varphi$-Connes module amenability of a dual Banach algebra $\mathcal{A}$ where $\varphi$ is a bounded $w_{k^*}$-module homomorphism from $\mathcal{A}$ to $\mathcal{A}$. We are mainly concerned with the study of $\varphi$-module normal virtual diagonals. We show that if $S$ is a weakly cancellative inverse semigroup with subsemigroup $E$ of idemp...
Plug-and-play dual-tree algorithm runtime analysis
Numerous machine learning algorithms contain pairwise statistical problems at their core, that is, tasks that require computations over all pairs of input points if implemented naively. Often, tree structures are used to solve these problems efficiently. Dual-tree algorithms can efficiently solve or approximate many of these problems. Using cover trees, rigorous worst-case runtime guarantees hav...
An Algorithm for Multicast Tree Generation in Networks with Asymmetric Links
We formulate the problem of multicast tree generation in asymmetric networks as one of computing a directed Steiner tree of minimal cost. We present a new polynomial-time algorithm that provides for tradeoff selection, using a single parameter K, between the tree-cost (Steiner cost) and the runtime efficiency. Using theoretical analysis, we (1 show that it is highly with a performance guarant...
Bounded approximate Connes-amenability of dual Banach algebras
We study the notion of bounded approximate Connes-amenability for dual Banach algebras and characterize this type of algebras in terms of approximate diagonals. We show that bounded approximate Connes-amenability of dual Banach algebras forces them to be unital. For a separable dual Banach algebra, we prove that bounded approximate Connes-amenability implies sequential approximat...
Journal title:
- CoRR
Volume abs/1601.03754, issue -
Pages -
Publication date 2016